MIXED_LINEAR_MODEL
Overview
The MIXED_LINEAR_MODEL function fits a Linear Mixed Effects Model (LMM), also known as a multilevel or hierarchical linear model, for analyzing data with grouped or nested structures. These models are essential when observations are not independent, such as in longitudinal studies, repeated measures designs, or when data are naturally clustered (e.g., students within schools, patients within clinics).
Unlike standard linear regression, which assumes all observations are independent, linear mixed models account for correlation within groups by incorporating both fixed effects (population-level parameters shared across all groups) and random effects (group-specific deviations from the population mean). This implementation uses the MixedLM class from the statsmodels library.
The general model for group i can be expressed as:
Y_i = X_i\beta + Z_i\gamma_i + \epsilon_i
where Y_i is the response vector, X_i is the fixed effects design matrix, \beta contains the fixed effects coefficients, Z_i is the random effects design matrix, \gamma_i \sim N(0, \Psi) is the random effects vector specific to group i, and \epsilon_i \sim N(0, \sigma^2 I) represents the residual errors.
The function supports random intercept models (when x_random is omitted), which allow each group to have its own baseline level, and random slope models (when x_random is provided), which allow both intercepts and covariate slopes to vary across groups. Parameters are estimated using Maximum Likelihood (ML) estimation, with the implementation following the Newton-Raphson and EM algorithms described by Lindstrom and Bates (1988).
The function returns fixed effects coefficients with standard errors, z-statistics, p-values, and confidence intervals, along with random effects variance components, log-likelihood, AIC, and BIC for model comparison. For more details, see the statsmodels Linear Mixed Effects Models documentation.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=MIXED_LINEAR_MODEL(y, x, groups, x_random, fit_intercept, alpha)
y(list[list], required): Column vector of dependent variable values (numeric).x(list[list], required): Matrix of fixed effects predictors. Each column is a predictor.groups(list[list], required): Column vector of group/cluster membership identifiers (numeric, integer coded).x_random(list[list], optional, default: null): Matrix of random effects predictors. If omitted, uses random intercept only.fit_intercept(bool, optional, default: true): Whether to include a fixed intercept term.alpha(float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1).
Returns (list[list]): 2D list with mixed model results, or error message string.
Examples
Example 1: Linear mixed model with random intercept
Inputs:
| y | x | groups |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
| 4 | 1 | 2 |
| 5 | 2 | 2 |
| 6 | 3 | 2 |
Excel formula:
=MIXED_LINEAR_MODEL({1;2;3;4;5;6}, {1;2;3;1;2;3}, {1;1;1;2;2;2})
Expected output:
"non-error"
Example 2: Linear mixed model with three groups
Inputs:
| y | x | groups |
|---|---|---|
| 2 | 1 | 1 |
| 4 | 2 | 1 |
| 6 | 3 | 1 |
| 3 | 1 | 2 |
| 5 | 2 | 2 |
| 7 | 3 | 2 |
| 4 | 1 | 3 |
| 6 | 2 | 3 |
| 8 | 3 | 3 |
Excel formula:
=MIXED_LINEAR_MODEL({2;4;6;3;5;7;4;6;8}, {1;2;3;1;2;3;1;2;3}, {1;1;1;2;2;2;3;3;3})
Expected output:
"non-error"
Example 3: Linear mixed model with 90% confidence intervals
Inputs:
| y | x | groups | alpha |
|---|---|---|---|
| 10 | 1 | 1 | 0.1 |
| 12 | 2 | 1 | |
| 14 | 3 | 1 | |
| 11 | 1 | 2 | |
| 13 | 2 | 2 | |
| 15 | 3 | 2 |
Excel formula:
=MIXED_LINEAR_MODEL({10;12;14;11;13;15}, {1;2;3;1;2;3}, {1;1;1;2;2;2}, 0.1)
Expected output:
"non-error"
Example 4: Linear mixed model with larger dataset
Inputs:
| y | x | groups |
|---|---|---|
| 5 | 1 | 1 |
| 7 | 2 | 1 |
| 9 | 3 | 1 |
| 11 | 4 | 1 |
| 6 | 1 | 2 |
| 8 | 2 | 2 |
| 10 | 3 | 2 |
| 12 | 4 | 2 |
Excel formula:
=MIXED_LINEAR_MODEL({5;7;9;11;6;8;10;12}, {1;2;3;4;1;2;3;4}, {1;1;1;1;2;2;2;2})
Expected output:
"non-error"
Python Code
import math
import warnings
import numpy as np
import pandas as pd
from statsmodels.regression.mixed_linear_model import MixedLM as statsmodels_MixedLM
def mixed_linear_model(y, x, groups, x_random=None, fit_intercept=True, alpha=0.05):
"""
Fits a Linear Mixed Effects Model (LMM) with random intercepts and slopes.
See: https://www.statsmodels.org/stable/generated/statsmodels.regression.mixed_linear_model.MixedLM.html
This example function is provided as-is without any representation of accuracy.
Args:
y (list[list]): Column vector of dependent variable values (numeric).
x (list[list]): Matrix of fixed effects predictors. Each column is a predictor.
groups (list[list]): Column vector of group/cluster membership identifiers (numeric, integer coded).
x_random (list[list], optional): Matrix of random effects predictors. If omitted, uses random intercept only. Default is None.
fit_intercept (bool, optional): Whether to include a fixed intercept term. Default is True.
alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.
Returns:
list[list]: 2D list with mixed model results, or error message string.
"""
def to2d(arr):
return [[arr]] if not isinstance(arr, list) else arr
def validate_2d_numeric(arr, name):
if not isinstance(arr, list):
return f"Invalid input: {name} must be a 2D list."
if not arr:
return f"Invalid input: {name} cannot be empty."
for i, row in enumerate(arr):
if not isinstance(row, list):
return f"Invalid input: {name} row {i} must be a list."
if not row:
return f"Invalid input: {name} row {i} cannot be empty."
for j, val in enumerate(row):
if not isinstance(val, (int, float)):
return f"Invalid input: {name}[{i}][{j}] must be numeric."
if math.isnan(val) or math.isinf(val):
return f"Invalid input: {name}[{i}][{j}] must be finite."
return None
# Normalize inputs to 2D lists
y = to2d(y)
x = to2d(x)
groups = to2d(groups)
if x_random is not None:
x_random = to2d(x_random)
# Validate inputs
err = validate_2d_numeric(y, "y")
if err:
return err
err = validate_2d_numeric(x, "x")
if err:
return err
err = validate_2d_numeric(groups, "groups")
if err:
return err
if x_random is not None:
err = validate_2d_numeric(x_random, "x_random")
if err:
return err
# Check dimensions
n_obs = len(y)
if len(x) != n_obs:
return "Invalid input: y and x must have the same number of rows."
if len(groups) != n_obs:
return "Invalid input: y and groups must have the same number of rows."
if x_random is not None and len(x_random) != n_obs:
return "Invalid input: y and x_random must have the same number of rows."
# Check column consistency
n_y_cols = len(y[0])
for i, row in enumerate(y):
if len(row) != n_y_cols:
return f"Invalid input: y row {i} has {len(row)} columns, expected {n_y_cols}."
if n_y_cols != 1:
return "Invalid input: y must be a column vector (single column)."
n_x_cols = len(x[0])
for i, row in enumerate(x):
if len(row) != n_x_cols:
return f"Invalid input: x row {i} has {len(row)} columns, expected {n_x_cols}."
n_groups_cols = len(groups[0])
for i, row in enumerate(groups):
if len(row) != n_groups_cols:
return f"Invalid input: groups row {i} has {len(row)} columns, expected {n_groups_cols}."
if n_groups_cols != 1:
return "Invalid input: groups must be a column vector (single column)."
if x_random is not None:
n_xr_cols = len(x_random[0])
for i, row in enumerate(x_random):
if len(row) != n_xr_cols:
return f"Invalid input: x_random row {i} has {len(row)} columns, expected {n_xr_cols}."
# Validate scalar parameters
if not isinstance(fit_intercept, bool):
return "Invalid input: fit_intercept must be a boolean."
if not isinstance(alpha, (int, float)):
return "Invalid input: alpha must be numeric."
if math.isnan(alpha) or math.isinf(alpha):
return "Invalid input: alpha must be finite."
if alpha <= 0 or alpha >= 1:
return "Invalid input: alpha must be between 0 and 1."
# Extract flat arrays
y_flat = np.array([row[0] for row in y])
groups_flat = np.array([row[0] for row in groups])
# Build DataFrame for fixed effects
# Transpose x: convert from list of rows to dict of columns
x_dict = {}
if fit_intercept:
x_dict['Intercept'] = [1.0] * n_obs
for col_idx in range(len(x[0])):
x_dict[col_idx] = [row[col_idx] for row in x]
x_df = pd.DataFrame(x_dict)
# Build random effects structure
if x_random is None:
# Random intercept only - use formula interface
exog_re = None
else:
xr_dict = {}
for col_idx in range(len(x_random[0])):
xr_dict[col_idx] = [row[col_idx] for row in x_random]
exog_re = pd.DataFrame(xr_dict)
# Fit the model
try:
with warnings.catch_warnings():
warnings.filterwarnings('ignore')
model = statsmodels_MixedLM(
endog=y_flat,
exog=x_df,
groups=groups_flat,
exog_re=exog_re,
missing='drop'
)
result = model.fit(reml=False)
except Exception as e:
return f"statsmodels.regression.mixed_linear_model.MixedLM error: {e}"
# Check for convergence issues
if hasattr(result, 'converged') and not result.converged:
return "statsmodels.regression.mixed_linear_model.MixedLM error: Model did not converge."
# Extract fixed effects results
fe_params = result.fe_params
bse = result.bse
tvalues = result.tvalues
pvalues = result.pvalues
conf_int = result.conf_int(alpha=alpha)
# Validate outputs don't contain NaN or inf
for val in [fe_params, bse, tvalues, pvalues]:
if hasattr(val, 'values'):
arr = val.values
else:
arr = np.array(val)
if np.any(np.isnan(arr)) or np.any(np.isinf(arr)):
return "statsmodels.regression.mixed_linear_model.MixedLM error: Model produced non-finite values."
output = []
# Section 1: Fixed effects (only the fixed parameters, not random variance)
output.append(['fixed_effects', 'parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper'])
# Add all fixed effect parameters
for i in range(len(fe_params)):
param_name = str(fe_params.index[i])
output.append([
'fixed_effects',
param_name,
float(fe_params.iloc[i]),
float(bse.iloc[i]),
float(tvalues.iloc[i]),
float(pvalues.iloc[i]),
float(conf_int.iloc[i, 0]),
float(conf_int.iloc[i, 1])
])
# Section 2: Random effects variance components (padded to 8 columns)
output.append(['random_effects', 'parameter', 'variance', 'std_dev', None, None, None, None])
# Extract random effects covariance (returns DataFrame)
cov_re = result.cov_re
# Check if it's a DataFrame or other structure
if hasattr(cov_re, 'values'):
cov_re_array = cov_re.values
# Check for NaN/inf in random effects
if np.any(np.isnan(cov_re_array)) or np.any(np.isinf(cov_re_array)):
return "statsmodels.regression.mixed_linear_model.MixedLM error: Random effects contain non-finite values."
# Get diagonal elements (variances)
if cov_re_array.ndim == 2:
for i in range(cov_re_array.shape[0]):
var = float(cov_re_array[i, i])
if i < len(cov_re.index):
param_name = str(cov_re.index[i])
else:
param_name = f'Random[{i}]'
output.append(['', param_name, var, math.sqrt(var), None, None, None, None])
elif cov_re_array.ndim == 0:
# Scalar case
var = float(cov_re_array)
output.append(['', 'Group Var', var, math.sqrt(var), None, None, None, None])
else:
# 1D array
var = float(cov_re_array[0])
output.append(['', 'Group Var', var, math.sqrt(var), None, None, None, None])
else:
# Fallback for other types
try:
var = float(cov_re)
if math.isnan(var) or math.isinf(var):
return "statsmodels.regression.mixed_linear_model.MixedLM error: Random effects contain non-finite values."
output.append(['', 'Group Var', var, math.sqrt(var), None, None, None, None])
except:
return "statsmodels.regression.mixed_linear_model.MixedLM error: Unable to extract random effects."
# Model statistics (padded to 8 columns)
llf = float(result.llf)
aic_val = float(result.aic)
bic_val = float(result.bic)
scale_val = float(result.scale)
# Validate model statistics
if math.isnan(llf) or math.isinf(llf):
return "statsmodels.regression.mixed_linear_model.MixedLM error: Log-likelihood is non-finite."
if math.isnan(aic_val) or math.isinf(aic_val):
return "statsmodels.regression.mixed_linear_model.MixedLM error: AIC is non-finite."
if math.isnan(bic_val) or math.isinf(bic_val):
return "statsmodels.regression.mixed_linear_model.MixedLM error: BIC is non-finite."
output.append(['log_likelihood', llf, None, None, None, None, None, None])
output.append(['aic', aic_val, None, None, None, None, None, None])
output.append(['bic', bic_val, None, None, None, None, None, None])
output.append(['scale', scale_val, None, None, None, None, None, None])
return output